Analysis-ready satellite data mosaics from Landsat and Sentinel-2 imagery

Today's enormous amounts of freely available high-resolution satellite imagery provide the demand for effective preprocessing methods. One such preprocessing method needed in many applications utilizing optical satellite imagery from the Landsat and Sentinel-2 archives is mosaicking. Merging hundreds of single scenes into a single satellite data mosaic before conducting analysis such as land cover classification, change detection, or modelling is often a prerequisite. Maintaining the original data structure and preserving metadata for further modelling or classification would be advantageous for many applications. Furthermore, in other applications, e.g., connected to land cover classification creating the mosaic for a specific period matching the phenological state of the phenomena in nature would be beneficial. In addition, supporting in-house and computing centers not directly connected to a specific cloud provider could be a requirement for some institutions or companies. In the current work, we present a method called Geomosaic that meets these criteria and produces analysis-ready satellite data mosaics from Landsat and Sentinel-2 imagery.• The method described produces analysis-ready satellite data mosaics.• The satellite data mosaics contain pixel metadata usable for further analysis.• The algorithm is available as an open-source tool coded in Python and can be used on multiple platforms.


Method details
Methods for combining multiple single scene satellite imagery have been used for many decades [1] . The, early methods consist of merging a few selected scenes [1] . However, multiple scenes are used as more high-resolution satellite imagery has become freely available, e.g., from the Landsat archive [2] and as imagery from the Copernicus Sentinel-2 satellites [3] . As a result, many mosaics e.g., [ 4 , 5 ] and tools to create such exists e.g. [6][7][8][9][10] . Mosaicking refers to combining two or more spatially independent images into one continuous image, while compositing refers to combining multiple overlapping images into one using an aggregation function [11] . The current method focuses on combining several independent images without any aggregation function; thus, we refer to the method as mosaicking. Some of the existing mosaics are only static and application-specific tuning is impossible [ 4 , 5 ]. Other methods may not provide pixel-level metadata that can be used in modeling, classification or stratification [ 8 , 10 ]. For example, when creating data mosaics based on aggregating pixel values from the same locations e.g., averaging methods such metadata are not available [8] . However, such mosaics may be appropriate for certain applications, but in other applications metadata may be desired. Furthermore, many of the mentioned mosaicking tools only work on a specific type of satellite data as Landsat 8 or Sentinel-2 data [6] . At last, mosaicking tools could be proprietary or connected to a specific software or cloud platform [9] . Based on these issues, we established a generic mosaicking method working on both Sentinel-2 and Landsat 4-8. Thus, we developed the Geomosaic method that implements a configurable smart pixel selection method that lets the user adjust the pixel selection criteria to fit the particular use case. The Geomosaic method is implemented in a software tool developed in Python and is available at Gitlab under a MIT license.
The Geomosaic tool has specific implementations for: (1) Sentinel-2 surface reflectance data (L2A), (2) Landsat 8 Collection 1 Surface Reflectance data (LaSRC -OLI/TIRS) and (3) Landsat 4-7 Collection 1 Surface Reflectance data (LEDAPS -TM/ETM + ). Spatiotemporal data aggregation is performed by (1) sorting the input scenes by date, (2) reprojecting the scene to the desired target projection if needed, (3) calculating the spatial extents of potential contribution from the scene, and (4) generating a pixel selection mask at each step. The pixel selection mask is established using the Geomosaic method described below.
The Geomosaic method uses a two-step pixel selection procedure based on a class score and a desirability score. Thus, for each pixel in the mosaic pixels are updated based on the pixel selection mask created from these two score values. If the new scenes' pixel has a larger class score or the same class score, but a larger desirability score the pixel values for all bands in the mosaic are updated ( Fig. 1 ).
In the first step in the Geomosaic method, the class score is established by selecting pixels according to a class categorization, where pixels from 'worse' classes are progressively exchanged for 'better' classes, according to a class hierarchy ( Table 1 ). The classes used in the hierarchy for Sentinel-2 was adapted from the Sentinel 2 MSI Technical Guide [12] . For Landsat 4-8 the classes described in the Land Surface Reflectance Code Product Guide V3.0 [13] was used. The class hierarchy and respective class scores used was fixed in the Geomosaic tool ( Table 1 ). Nevertheless, other classes and another hierarchy of the classes can be used with the Geomosaic method.
In the second step, a weighted pixel desirability score is calculated for each scene: where, describes the relative distance from the desired mosaic target date, is a ratio of good pixels in the scene against all pixels and describes the mean aerosol optical thickness of the scene, and , , are relative weights that are summarized to one:

Table 1
Class hierarchy adopted to compute class score and pixel quality. A larger class score means a better pixel quality, i.e., a score of 1 is the worse pixels while a score of six or seven is the best quality for Sentinel-2 or Landsat, respectively.
These weights can be changed in order to adapt the mosaic to a specific problem or application, e.g., by increasing the value of to have input images closer to the target date or increase the value of to have a higher spatial homogeneity of the mosaic or alternatively increase the atmospheric clarity of the mosaic by increasing . The variable is a value between 0 and 1 in a quasi-Gaussian distribution: where, D target describes the target date for the mosaic, D describes the date of the current scene and 2 is the standard deviation in days. The value used for 2 in the Geomosaic tool was 30 days. The value of is one if the image is from the desired date and will decrease as the difference between the target date and the scene date increases. Images from + /-30 days are strongly preferred over other dates due to the selected standard deviation ( Fig. 2 ).
The ratio of good pixels in the scene against all pixels is computed as: Fig. 2. The value of S date based on the difference between the target date and the date of the input scene ( 2 = 30). where, n is the total number of pixels in the scene and takes the value 1 if the pixel i is classified as good and 0 if not classified as good. The pixel is classified as good if the pixel has the highest class score possible for the given satellite data type and typically denoting a cloud, snow, and artefact free pixel (i.e., class score 6 for Sentinel-2 and class score 7 for Landsat, see Table 1 ).
The mean aerosol optical thickness of the scene is described with and calculated as follows: where, is the "aerosol optical thickness " of pixel i , and n is the number of pixels. When performing the spatiotemporal data aggregation of the sorted input scenes, all pixels in the mosaic with a smaller class score than the potential new scene are replaced. Furthermore, for pixels who have the highest possible class score in both the input scene and the mosaic, the pixel is exchanged only if the desirability score for the input scene S des is higher than the S des of the pixel currently in the mosaic. In addition to the mosaic, the algorithms create a tilemap that contains a metadata identifier to the original scene. In the Geomosaic tool provided, this value is stored as an unsigned integer, and the metadata is stored in an output GeoTiff file. Thus, each pixel in the mosaic is easily traceable back to the original input scene from which it was selected.
The Geomosaic method is applicable to several use cases and types of satellites data and provides a mosaicking method that enables user-specific inputs and provides analysis-ready satellite data mosaics. See examples of tile maps and mosaics in Fig. 3 .

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability
No data was used for the research described in the article.